-------------------------------------------------------------------------------
README: SADE Acoustic Model
-------------------------------------------------------------------------------

Full name:    South African Directory Enquiries (SADE) Acoustic Model

Description:  The acoustic model used by the speech recognition component of
              the SADE application.

Version:      1.0

Size:         19MB

URL:          http://rma.nwu.ac.za/
              Any queries with regard to updated versions can be directed to
              rma@nwu.ac.za.

-------------------------------------------------------------------------------

This data is shared under the Creative Commons Attribution 3.0 Unported
(CC BY 3.0) license. For more information, see license.txt.

When using this acoustic model, please cite:
  Charl van Heerden, Marelie Davel and Etienne Barnard, "Performance analysis
  of a multilingual directory enquiries application", in Proc. Annual Symp.
  Pattern Recognition Association of South Africa (PRASA), pp. 258-263,
  Cape Town, South Africa, November 2014.

bibtex:
@inproceedings{vanheerden14dePerformanceAnalysis,
  author    = {Charl van Heerden and Marelie Davel and Etienne Barnard},
  title     = {Performance analysis of a multilingual directory enquiries
               application},
  booktitle = {Proc. Annual Symp. Pattern Recognition Association of South 
               Africa (PRASA)},
  pages     = {258--263},
  address   = {Cape Town, South Africa},
  year      = {2014},
  month     = {November}
}

-------------------------------------------------------------------------------

DETAILED INFORMATION

DESCRIPTION:

The SADE acoustic model was trained using the Kaldi toolkit (which can be 
downloaded from http://kaldi.sourceforge.net/), using a combination of corpora:

* Lwazi corpus [1]: the Afrikaans, English, Sesotho and isiZulu Lwazi corpora. 
                    This subset of the Lwazi corpus consists of 800 speakers,
                    and amounts to 25 hours of speech.
* SADE corpus  [2]: the entire SADE corpus, including an in-house subset that
                    will not be released in the public domain, was used. The 
                    corpus contains 24 hours of speech from 44 speakers
                    (a balance of Afrikaans, English, Sesotho and isiZulu male
                    and female speakers).
* Municipality names corpus:
                    this is an in-house corpus of spoken municipality names.
                    The corpus consists of 2h30m of speech from 24 speakers.
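
As a rough tally of the figures listed above, the three corpora can be summed
as follows. This is a minimal Python sketch: the dictionary layout and variable
names are illustrative, and the speaker total assumes that no speaker appears
in more than one corpus, which this README does not state.

```python
# Durations (hours) and speaker counts as listed in the corpus descriptions.
corpora = {
    "Lwazi subset": {"hours": 25.0, "speakers": 800},
    "SADE": {"hours": 24.0, "speakers": 44},
    "Municipality names": {"hours": 2.5, "speakers": 24},  # 2h30m
}

total_hours = sum(c["hours"] for c in corpora.values())

# Assumption: speakers are disjoint across corpora (not confirmed here).
total_speakers = sum(c["speakers"] for c in corpora.values())

print(f"{total_hours} hours, {total_speakers} speakers")
```

Under that assumption the training material amounts to roughly 51.5 hours of
speech.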

The acoustic models were trained using a recipe similar to the Kaldi Babel and
Wall Street Journal (WSJ) recipes. Standard 3-state left-to-right triphone
hidden Markov models (HMMs), with Gaussian mixture models (GMMs) as the
statistical model, were trained with a maximum likelihood linear transform
(MLLT) and speaker-specific feature-space maximum likelihood linear regression
(fMLLR) transforms. The features are standard Mel-frequency cepstral
coefficients (MFCCs) with cepstral mean normalization (CMN) applied per
speaker. Frames are spliced together, and linear discriminant analysis (LDA)
reduces the dimensionality of the spliced features to 40. The resulting
GMM-HMM models were then used to create alignments, which initialized the
training of a deep neural network (DNN) with 3 hidden layers.
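
The feature steps described above (per-speaker CMN, frame splicing, and a
linear projection to 40 dimensions) can be illustrated with a small NumPy
sketch. It is not the Kaldi implementation. The 13 cepstral coefficients and
the splicing context of 4 frames on each side are assumed values that this
README does not specify, the helper names are hypothetical, and a random matrix
stands in for the trained LDA transform.

```python
import numpy as np


def cmn_per_speaker(feats_by_utt):
    """Subtract the mean over all of one speaker's frames from every frame."""
    all_frames = np.vstack(list(feats_by_utt.values()))
    mean = all_frames.mean(axis=0)
    return {utt: feats - mean for utt, feats in feats_by_utt.items()}


def splice(feats, context):
    """Stack each frame with `context` neighbours per side (edges repeated)."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.hstack(windows)


# Assumed sizes: only OUT_DIM = 40 comes from the description above.
NUM_CEPS, CONTEXT, OUT_DIM = 13, 4, 40

rng = np.random.default_rng(0)
# Random stand-ins for the MFCCs of two utterances by the same speaker.
speaker_utts = {
    "utt1": rng.normal(size=(50, NUM_CEPS)),
    "utt2": rng.normal(size=(30, NUM_CEPS)),
}

normed = cmn_per_speaker(speaker_utts)
spliced = {utt: splice(f, CONTEXT) for utt, f in normed.items()}

# Placeholder for the trained transform; a real system would load the matrix
# estimated during training rather than draw a random one.
projection = rng.normal(size=(OUT_DIM, NUM_CEPS * (2 * CONTEXT + 1)))
reduced = {utt: f @ projection.T for utt, f in spliced.items()}

print({utt: f.shape for utt, f in reduced.items()})
```

With these assumed sizes each spliced frame has 13 x 9 = 117 dimensions before
the projection reduces it to 40.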

-------------------------------------------------------------------------------

CORPUS DIRECTORY/FILE STRUCTURE:

sade.v1.0/trunk/unimrcp/usr/local/unimrcp/data/nnet_models_2014-07-29
├── conf
│   ├── global_cmvn.stats
│   ├── mfcc.conf
│   ├── online_cmvn.conf
│   ├── online_decoding.conf
│   └── splice.conf
├── graph_FREESTATE_EASTERNCAPE
│   ├── HCLG.fst
│   ├── word_boundary.int
│   └── words.txt
├── graph_LOCALDISTRICT
│   ├── HCLG.fst
│   ├── word_boundary.int
│   └── words.txt
├── graph_MUN
│   ├── HCLG.fst
│   ├── word_boundary.int
│   └── words.txt
├── graph_YESNO
│   ├── HCLG.fst
│   ├── word_boundary.int
│   └── words.txt
├── license.txt
├── README.acoustic_model.v1.0.txt
└── tri6_nnet
    ├── final.mat
    └── final.mdl

6 directories, 21 files
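
To confirm that an unpacked copy matches the listing above, a check along the
following lines may be useful. This is a minimal Python sketch: the function
name missing_files and the EXPECTED table are hypothetical helpers written for
this README, not part of the release, and the directory passed in is assumed to
be the nnet_models_2014-07-29 directory.

```python
import tempfile
from pathlib import Path

# Files expected in each subdirectory, copied from the listing above.
EXPECTED = {
    "conf": [
        "global_cmvn.stats",
        "mfcc.conf",
        "online_cmvn.conf",
        "online_decoding.conf",
        "splice.conf",
    ],
    "tri6_nnet": ["final.mat", "final.mdl"],
}
for graph in ("FREESTATE_EASTERNCAPE", "LOCALDISTRICT", "MUN", "YESNO"):
    EXPECTED[f"graph_{graph}"] = ["HCLG.fst", "word_boundary.int", "words.txt"]


def missing_files(model_dir):
    """Return the relative paths from EXPECTED that are absent in model_dir."""
    root = Path(model_dir)
    return [
        f"{sub}/{name}"
        for sub, names in EXPECTED.items()
        for name in names
        if not (root / sub / name).is_file()
    ]


# Demonstration on an empty temporary directory: every file is reported.
with tempfile.TemporaryDirectory() as tmp:
    missing = missing_files(tmp)
    print(f"{len(missing)} expected files are missing")
```

On a complete copy of the model directory the returned list should be empty.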

-------------------------------------------------------------------------------

ADDITIONAL DOCUMENTATION:

[1]     E. Barnard, M. Davel, and C. van Heerden, "ASR corpus design for 
        resource-scarce languages," in Proc. Interspeech, Brighton, UK,
        Sept. 2009, pp. 2847–2850.

[2]     J. W. Thirion, C. van Heerden, O. Giwa, and M. H. Davel, "The South
        African Directory Enquiries (SADE) corpus," to be submitted.

[3]     C. van Heerden, M. Davel, and E. Barnard, "Performance analysis of a
        multilingual directory enquiries application," in Proc. Annual Symp.
        Pattern Recognition Association of South Africa (PRASA), Cape Town,
        South Africa, Nov. 2014, pp. 258–263.

-------------------------------------------------------------------------------
